%/* ----------------------------------------------------------- */
%/*                                                             */
%/*                          ___                                */
%/*                       |_| | |_/   SPEECH                    */
%/*                       | | | | \   RECOGNITION               */
%/*                       =========   SOFTWARE                  */ 
%/*                                                             */
%/*                                                             */
%/* ----------------------------------------------------------- */
%/*         Copyright: Microsoft Corporation                    */
%/*          1995-2000 Redmond, Washington USA                  */
%/*                    http://www.microsoft.com                */
%/*                                                             */
%/*   Use of this software is governed by a License Agreement   */
%/*    ** See the file License for the Conditions of Use  **    */
%/*    **     This banner notice must not be removed      **    */
%/*                                                             */
%/* ----------------------------------------------------------- */
%
% HTKBook - Steve Young  24/11/97
%

\newpage
\mysect{HBuild}{HBuild}

\mysubsect{Function}{HBuild-Function}

\index{hbuild@\htool{HBuild}|(}
This program converts input files that represent language
models in a number of different formats into a standard
\HTK\ lattice. The main purpose of \htool{HBuild} is to allow the
expansion of \HTK\ multi-level lattices and the conversion of
bigram language models (such as those generated by \htool{HLStats})
into lattice format. 

The specific input file types supported by \htool{HBuild} are:
\begin{enumerate}
\item \HTK\ multi-level lattice files.
\item Back-off bigram files in ARPA/MIT-LL format.
\item Matrix bigram files produced by \htool{HLStats}.
\item Word lists (to generate a word-loop grammar).
\item Word-pair grammars in ARPA Resource Management format.
\end{enumerate}

The formats of both types of bigram supported by \htool{HBuild} 
are described in Chapter~\ref{c:netdict}. The format for multi-level
\HTK\ lattice files is described in Chapter~\ref{c:htkslf}.

\mysubsect{Use}{HBuild-Use}

\htool{HBuild} is invoked by the command line
\begin{verbatim}
   HBuild [options] wordList outLatFile
\end{verbatim}
The {\tt wordList} should contain a list of all the words used
in the input language model. The options specify the type of input
language model as well as the source filename. If none of the options
specifying the input language model type is given, a simple word-loop
is generated using the {\tt wordList} given. After processing the
input language model, the resulting lattice
is saved to file {\tt outLatFile}.
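
As a minimal sketch of the default behaviour, assume a word list file
called {\tt wlist} and an output file called {\tt wdnet} (both names are
illustrative only and are not prescribed by \htool{HBuild}). Because no
option selecting an input language model is given, the following
invocation would generate a simple word-loop lattice:
\begin{verbatim}
   HBuild wlist wdnet
\end{verbatim}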

The operation of \htool{HBuild} is controlled by the following
command line options
\begin{optlist}
  \ttitem{-b} Output the lattice in binary format. This increases the
              speed of subsequent loading (the default is an ASCII text lattice).

  \ttitem{-m fn} The matrix format bigram in {\tt fn} forms the input
              language model.

  \ttitem{-n fn} The ARPA/MIT-LL format back-off bigram in {\tt fn} 
              forms the input language model.

  \ttitem{-s st en} Set the bigram entry and exit words to {\tt st} 
        and {\tt en}.  (Default {\tt !ENTER} and {\tt !EXIT}).
        Note that no words will follow the exit word, or precede
        the entry word. Both the entry and exit words must be included
        in the {\tt wordList}. This option is only effective in conjunction
          with the \texttt{-n} option.

  \ttitem{-t st en} This option is only effective for word-loop and
        word-pair grammars.
        An output lattice is produced with an initial word-symbol
        {\tt st} (before the loop) and a final word-symbol {\tt en}
        (after the loop). This allows initial and final silences
        to be specified. (Default is that the initial and final nodes
        are labelled with {\tt !NULL}). Note that {\tt st} and {\tt en} 
        should not be included in the {\tt wordList} unless they occur 
        elsewhere in the network.

  \ttitem{-u s} The unknown word is {\tt s} (default {\tt !NULL}). This
         option only has an effect when bigram input language models 
         are specified. It can be used in conjunction with the {\tt -z}
         flag to delete the symbol for unknown words from the output
         lattice.

  \ttitem{-w fn} The word-pair grammar in {\tt fn} 
              forms the input language model. The file must be in
         the format used for the ARPA Resource Management grammar.

  \ttitem{-x fn} The extended \HTK\ lattice in {\tt fn} 
              forms the input language model. This option is
              used to expand a multi-level lattice into a
              single-level lattice that can be processed by other \HTK\ tools.

  \ttitem{-z} Delete (zap) any references to the unknown word (see {\tt -u} 
              option) in the output lattice.

\end{optlist}
\stdopts{HBuild}
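
The invocations below sketch how the options described above might be
combined. They are illustrations under stated assumptions rather than
recommended recipes: all file names ({\tt wlist}, {\tt bigram.arpa},
{\tt bigram.mat}, {\tt wpair.gram}, {\tt multi.lat} and {\tt wdnet}) and
the word symbols {\tt SENT-START}, {\tt SENT-END}, {\tt sil} and
{\tt !!UNK} are hypothetical and should be replaced by those used in the
actual task.
\begin{verbatim}
   # back-off bigram with the default !ENTER and !EXIT words
   HBuild -n bigram.arpa wlist wdnet

   # back-off bigram with assumed entry and exit words,
   # both of which must appear in wlist
   HBuild -n bigram.arpa -s SENT-START SENT-END wlist wdnet

   # matrix bigram, binary output, unknown word removed
   HBuild -b -m bigram.mat -u '!!UNK' -z wlist wdnet

   # word-loop with sil before and after the loop
   HBuild -t sil sil wlist wdnet

   # word-pair grammar in Resource Management format
   HBuild -w wpair.gram wlist wdnet

   # expand a multi-level lattice into a single level
   HBuild -x multi.lat wlist wdnet
\end{verbatim}
The single quotes around the assumed unknown word symbol only protect
the {\tt !} characters from interpretation by the shell.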

\mysubsect{Tracing}{HBuild-Tracing}

\htool{HBuild} supports the following trace options, where each
trace flag is given using an octal base
\begin{optlist}
   \ttitem{0001} basic progress reporting.
\end{optlist}
Trace flags are set using the \texttt{-T} option or the  \texttt{TRACE} 
configuration variable.
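
For example, assuming the illustrative file names {\tt wlist} and
{\tt wdnet}, basic progress reporting could be requested from the
command line as follows:
\begin{verbatim}
   HBuild -T 1 wlist wdnet
\end{verbatim}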
\index{hbuild@\htool{HBuild}|)}


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "../htkbook"
%%% End: 
